We congratulate the author on an informative and 
thought-provoking discussion on a topic of broad 
interest to the statistics community: the fitting of 
models to data collected through complex surveys. 
The number of papers written on this topic, whether 
from a model-based or design-based perspective, is 
substantial and goes back at least to Konijn (1962). 
This topic has led to some disagreements between 
those advocating that the design best be ignored 
when the primary interest is in the characteristics of 
the model, and those stating that the design cannot 
be ignored. More recently, both sides of this discus- 
sion have moved to something approaching a con- 
sensus, with those favoring a model-based approach 
acknowledging the need to account for nonignorable 
designs in the model fitting, while the traditional 
design-based view has been extended to explore cer- 
tain circumstances under which it is appropriate to 
ignore the design. 

The current article is an excellent example of those 
recent discussions of why the design needs to be ac- 
counted for in modeling, and how this can be done 
in practice. The importance of fully accounting for 
the design by incorporating all relevant interactions 
provides a good motivation for the discussion of the 
range of methods in the article. It also stresses other 
aspects of importance to people working with survey 
data, in particular the desirability of maintaining 
scale/location invariance and linearity of the model- 
based estimators. This ensures consistency of esti- 
mates for different variables in the survey, as well as 
additivity over domains within the population. (As 
an aside, the poststratified estimator arising from 
logistic regression in Section 3.2 can be modified to 
yield approximate weights by the method proposed 
in Wu and Sitter, 2001.) 

The article mentions a number of disadvantages 
of design-based (weighted) model fitting and infer- 
ence. Weights are viewed as complicated and mys- 
terious, in the sense that the modeler often does not 
know how they were constructed and hence might 
not want to rely on them when it comes to model 
specification and estimation. Estimation, and espe- 
cially variance estimation, are viewed as more cum- 
bersome under the design-based paradigm compared 
to a model-based analysis. In what follows, we will 
argue that a weighted analysis offers some distinct 
advantages and might actually reduce the complex- 
ity of the analysis in many cases, at least from the 
perspective of a statistician interested in using pre- 
viously collected and weighted survey data to fit a 
model. 

A key feature of the design-based paradigm (broad- 
ly speaking) is that it makes it possible to separate 
design and postsample adjustments from data anal- 
ysis. Individuals tasked with creating survey weights 
are typically within the organization collecting the 
data, and will be referred to here as "the survey statisticians." 
They have knowledge of the sampling design 
and have access to detailed information on the 
nonresponse characteristics of the sample and to rel- 
evant auxiliary information. Based on these sources 
of information, they develop a set of survey weights 
(and sometimes also produce sets of replication 
weights for variance estimation). As noted in the 
article, these weights are often much more compli- 
cated than simple inverses of inclusion probabilities, 
and in fact reflect the best effort on the part of the 
survey statisticians creating the weights to account 
for nonresponse and incorporate potentially useful 
population-level information. These weights are ap- 
pended to the dataset, which is then made available 
to individuals interested in analyzing those data. 
These individuals will be referred to as "the data 
analysts." 
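
To make the weight-construction step described above concrete, the following is a minimal sketch of one common recipe: start from the inverse inclusion probabilities and then poststratify the respondents to known population counts. The function name, the cell labels and the population counts are all hypothetical, and weights for real surveys usually involve further steps (nonresponse modeling, raking, trimming) that are not shown here.

```python
import numpy as np


def poststratified_weights(incl_prob, cell, pop_counts):
    """Base weights 1/pi_i, rescaled so each poststratum sums to its known count.

    incl_prob  : inclusion probabilities of the respondents (assumed known)
    cell       : poststratum label of each respondent
    pop_counts : dict mapping each label to its population count N_g
    """
    base = 1.0 / np.asarray(incl_prob, dtype=float)
    cell = np.asarray(cell)
    w = base.copy()
    for g, N_g in pop_counts.items():
        in_g = cell == g
        # Ratio adjustment: respondents in cell g are scaled up (or down)
        # so that their weights add to N_g. Empty cells are not handled.
        w[in_g] *= N_g / base[in_g].sum()
    return w


# Hypothetical five-respondent example with two poststrata.
incl_prob = [0.1, 0.1, 0.2, 0.2, 0.2]
cell = ["a", "a", "b", "b", "b"]
pop_counts = {"a": 30, "b": 12}
w = poststratified_weights(incl_prob, cell, pop_counts)
print(w)
```

The data analysts would see only the final column `w`, not the inclusion probabilities or the cell counts that went into it, which is the separation of tasks discussed here.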

From the perspective of the data analysts, using 
these weights is convenient in the sense that they 
provide a simple way to account for the way the 
data were obtained, without requiring the data an- 
alysts to replicate many of the tasks of the survey 
statisticians. Overall, this "division of labor" allows 
both sets of statisticians to focus their efforts on the 
portion of the overall problem of most immediate 
interest to them, and for which they have both the 
expertise and the information available to best per- 
form the required tasks. 

As noted by a number of authors (e.g., Pfeffer- 
mann, 1993), performing a weighted analysis for a 
model using inverses of the inclusion probabilities 
ensures that the resulting estimators are design con- 
sistent for population-level quantities, which are 
themselves model consistent for the model parame- 
ters of interest. When the weights also include non- 
response adjustments (usually by way of poststrat- 
ification) as well as other calibration adjustments, 
results for descriptive statistics, including those discussed 
in Särndal and Lundström (2005), show that 
the estimators are consistent under the joint design- 
response mechanism. While these results are expect- 
ed to continue to hold when model parameters are 
targeted rather than finite population means, there 
is currently only limited formal theory exploring this 
topic. 
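
The two-step consistency argument at the start of the preceding paragraph can be written out as follows. The notation is ours and is introduced only for illustration: $U$ is the finite population, $s$ the sample, $\pi_i$ the inclusion probability of unit $i$, and $u(\cdot;\theta)$ the estimating (score) function of the model. This is a sketch of the standard pseudo-likelihood formulation, not a statement of any new result.

```latex
% Assumed notation: U = finite population, s = sample,
% w_i = 1/\pi_i = inverse inclusion probability,
% u(y, x; \theta) = estimating function of the model with parameter \theta.
\begin{align*}
\text{census parameter } \theta_N:\quad
  & \sum_{i \in U} u(y_i, x_i; \theta_N) = 0, \\
\text{weighted estimator } \hat\theta:\quad
  & \sum_{i \in s} w_i\, u(y_i, x_i; \hat\theta) = 0,
  \qquad w_i = 1/\pi_i, \\
\text{decomposition:}\quad
  & \hat\theta - \theta
  = \underbrace{(\hat\theta - \theta_N)}_{\text{design consistency}}
  + \underbrace{(\theta_N - \theta)}_{\text{model consistency}} .
\end{align*}
```

When the $w_i$ also carry nonresponse and calibration adjustments, the first term must be controlled under the joint design-response mechanism, which is where the formal theory for model parameters is still limited.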

The division of labor between the survey statis- 
ticians and the data analysts has some additional 
advantages. While the former typically have access 
to detailed unit-level information and can use that 
information in the construction of the weights, con- 
fidentiality issues often preclude such access for the 
latter. For instance, in the Social Indicators Sur- 
vey considered in the Gelman article, avoiding the 
weights required knowledge of the number of adults 
and the number of phone lines in the household 
of each respondent, as well as various other demo- 
graphic variables. It is easy to envision situations 
where at least some of these variables are not made 
available to the data analysts in order to protect the 
confidentiality of the survey respondents. In such sit- 
uations, the data analysts could still try to build 
a model that incorporates the design effects, but 
might end up only being partly successful because 
some influential variables are not available. 

Another consideration is the fact that large-scale 
surveys often involve complex stratification and poststratification 
schemes, multiple phases and/or stages 
of selection, imputation for item nonresponse, etc. 
Accounting for all these factors, even if the needed 
sources of information are available to the data an- 
alysts, would require significant time and effort on 
the part of the data analysts and result in mod- 
els that might be unwieldy and difficult to inter- 
pret. 

One point noted in the Gelman article is that 
variance estimation for weighted estimators is more 
cumbersome than for fully model-based estimators. 
To a large extent, this is indeed the case, but a num- 
ber of solutions are available. For specific models 
(e.g., linear or logistic regression), commercial soft- 
ware programs such as SAS are increasingly provid- 
ing design-based estimation procedures, so that with 
access to the weights and some basic information 
about the design (e.g., stratification information and 
primary sampling unit identifiers), it is possible for 
the data analysts to perform design-based inference 
for model parameter estimators. An alternative pro- 
cedure, already alluded to earlier and often used for 
large-scale surveys, is for the survey statisticians to 
provide sets of replication weights (e.g., jackknife or 
bootstrap replicates). In that case, variance estima- 
tion for the weighted estimates is a simple matter 
of recomputing the estimates for each set of repli- 
cate weights and calculating the variability among 
the replicate estimates. 
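
The replicate-weight recipe just described can be sketched in a few lines. The example below assumes a weighted least squares fit and builds delete-one jackknife replicates itself, purely so that it is self-contained; in practice the replicate weights and the appropriate scale factor would be supplied by the survey statisticians, and the function names and simulated data here are hypothetical.

```python
import numpy as np


def weighted_ls(X, y, w):
    """Weighted least squares coefficients: solve (X'WX) b = X'Wy."""
    Xw = X * w[:, None]
    return np.linalg.solve(Xw.T @ X, Xw.T @ y)


def jackknife_replicate_weights(w):
    """Delete-one jackknife: replicate r zeroes unit r and rescales the rest.

    Assumes an unstratified, single-stage design; real surveys would
    delete whole primary sampling units within strata.
    """
    n = len(w)
    reps = np.tile(w * n / (n - 1), (n, 1))
    np.fill_diagonal(reps, 0.0)
    return reps


def replicate_variance(X, y, w_full, w_reps, scale):
    """Re-fit under each replicate weight set; return estimate and variance."""
    theta_full = weighted_ls(X, y, w_full)
    diffs = np.array([weighted_ls(X, y, w_r) for w_r in w_reps]) - theta_full
    return theta_full, scale * (diffs.T @ diffs)


# Simulated data and hypothetical final survey weights.
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
w = rng.uniform(1.0, 5.0, size=n)

w_reps = jackknife_replicate_weights(w)
theta, V = replicate_variance(X, y, w, w_reps, scale=(n - 1) / n)
print("estimates:", theta)
print("standard errors:", np.sqrt(np.diag(V)))
```

The `scale` argument depends on the replication method ((n - 1)/n for this delete-one jackknife, a different constant for bootstrap or balanced repeated replication), so it is left as an explicit input rather than hard-coded.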

Incorporating the design and nonresponse char- 
acteristics of a dataset through explicit modeling is 
a statistically valid and conceptually attractive ap- 
proach to solving the nonignorability problem. It has 
the advantage of being easily integrated into the set 
of tools most familiar to data analysts, but, as ex- 
plained in this interesting article, it requires knowl- 
edge of the relevant variables and has to be done 
carefully. Performing a design-based analysis with 
the weights provided as part of a survey dataset is 
attractive as well, because it is generally applicable 
even without detailed knowledge of the way the data 
were obtained. 

In closing, we would like to suggest a number of 
possible developments that would help make data 
analysts more comfortable with these weighted anal- 
yses. While weight construction is likely to remain 
to a large extent an "art," more transparency in how 
weights are constructed might alleviate some of the 
discomfort on the part of data analysts having to 
rely on the work of survey statisticians as a building 
block in their own analysis. A related development 
might be more education and training in the interpretation 
of results of weighted analyses for nonsurvey 
statisticians and in methods for doing inference 
for design-weighted model estimates. On the survey 
statistics side, we would like to encourage the in- 
vestigation of the statistical properties of weighted 
estimators for model parameters that explicitly ac- 
counts for the multiple adjustments typically made 
to survey weights, including calibration and nonre- 
sponse weighting. 